Host protease activity classifies pneumonia etiology

Significance
Community-acquired pneumonia (CAP) is the most common infectious cause of death worldwide. In this work, we present a panel of protease-responsive nanosensors that leverage aberrant host protease activity in pneumonia to generate a urinary readout of disease. Notably, the urine signatures of host responses can also be used to differentiate between bacterial and viral pneumonia. These nanosensors constitute a possible route to diagnosing pneumonia that is orthogonal to existing clinical tests, thus opening a direction of study for pneumonia diagnostics.

across both diseases and cohorts, and each gene still retained the same distribution between diseases and controls within each data set.
Derivation of the 39-protease signature with MANATEE
MANATEE (Multicohort ANalysis with AggregaTed gEne Expression) is a multicohort analysis framework that is used to integrate gene expression datasets, perform differential expression analyses to identify top genes, apply machine learning methods to arrive at a concise diagnostic signature, and finally validate the discovered signature in independent data (Figure 2A) (5). In this analysis, any genes that did not code for proteases were removed from all datasets.
Next, relevant datasets were identified through a systematic search of public gene expression data repositories. Some of these datasets were chosen for training the signature, and the rest were set aside as independent validation datasets. Samples from the training datasets were then randomly split, with 70% of the samples assigned to Discovery and the other 30% assigned to Hold-out Validation. The Discovery and Hold-out Validation cohorts were each batch corrected with COCONUT conormalization.
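The random 70/30 assignment described above can be sketched as follows. This is a minimal illustration and is not taken from the MANATEE scripts: the function name, the fixed seed, and the sample identifiers are hypothetical, and the split is assumed to be a simple unstratified shuffle, since stratification by study or class is not stated here.

```python
import random


def split_discovery_holdout(sample_ids, discovery_fraction=0.7, seed=0):
    """Randomly assign samples to Discovery and Hold-out Validation.

    Assumption: a plain shuffle without stratification; the seed is fixed
    only so that the illustration is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_discovery = round(discovery_fraction * len(shuffled))
    return shuffled[:n_discovery], shuffled[n_discovery:]


# Hypothetical sample identifiers pooled from the training datasets.
samples = [f"sample_{i:03d}" for i in range(100)]
discovery, holdout = split_discovery_holdout(samples)
```

Each sample lands in exactly one of the two cohorts, so the Hold-out Validation samples are never seen during signature discovery.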
Next, differential expression statistics were calculated in Discovery. Four measures of differential expression between cases and controls were calculated for each protease: (1) the SAM score, from the Significance Analysis of Microarrays (SAM) method (33); (2) the corresponding SAM local FDR; (3) the Benjamini-Hochberg FDR-corrected P value from a t-test (34); and (4) the effect size (ES). The effect size was estimated as Hedges' adjusted g, which accounts for small-sample bias (6)(7)(8)(9). We also performed a leave-one-study-out (LOSO) analysis, in which each study that accounted for at least 5% of the training samples was iteratively removed from the training set, and the differential expression statistics were recalculated for each version of the training set with one study left out. Thus, for a protease to be selected, it had to pass the given thresholds not only in the statistics calculated for the full training set but also in those calculated for each version of the training set with one study removed.
This prevents any single study from exerting too strong an effect on the selection of proteases (10,11). Once the differential expression statistics were calculated, a set of "top" differentially expressed proteases was chosen by retaining only proteases that had an FDR of < 0.01 and an absolute effect size of > 0.6. This resulted in a 39-protease signature. The signature was first tested in Hold-out Validation to assess whether its performance remained robust in new data. Finally, the signature was tested in Independent Validation to measure its performance in completely independent data. The MANATEE scripts associated with this paper are publicly available and can be found at the following repository: https://github.com/Khatri-Lab/manatee_pnas.
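The effect-size estimate and the selection rule can be illustrated with a short sketch. It is not the MANATEE implementation. It assumes the common approximation of the Hedges correction factor, J = 1 - 3/(4(n1 + n2) - 9), and a single FDR value per protease, because the text does not say which of the two FDR measures is thresholded. All names and numbers below are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Hedges' adjusted g: Cohen's d scaled by a small-sample correction."""
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    d = (mean(cases) - mean(controls)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate correction factor J
    return d * correction


def passes(stats, fdr_max=0.01, es_min=0.6):
    """Thresholds from the text: FDR < 0.01 and |effect size| > 0.6."""
    return stats["fdr"] < fdr_max and abs(stats["es"]) > es_min


def select_proteases(full_stats, loso_stats):
    """Keep proteases that pass in the full set and in every LOSO version."""
    return [
        protease
        for protease, stats in full_stats.items()
        if passes(stats)
        and all(protease in run and passes(run[protease]) for run in loso_stats)
    ]


# Hypothetical statistics for three placeholder proteases.
full = {
    "PROTEASE_A": {"fdr": 0.001, "es": 0.9},
    "PROTEASE_B": {"fdr": 0.005, "es": -0.7},
    "PROTEASE_C": {"fdr": 0.200, "es": 1.0},
}
loso = [
    {"PROTEASE_A": {"fdr": 0.002, "es": 0.8},
     "PROTEASE_B": {"fdr": 0.020, "es": -0.7},  # fails once a study is removed
     "PROTEASE_C": {"fdr": 0.300, "es": 0.9}},
    {"PROTEASE_A": {"fdr": 0.003, "es": 0.85},
     "PROTEASE_B": {"fdr": 0.004, "es": -0.75},
     "PROTEASE_C": {"fdr": 0.150, "es": 1.1}},
]
selected = select_proteases(full, loso)
g_example = hedges_g([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

In this toy example only PROTEASE_A is selected: PROTEASE_B passes in the full training set but fails in one leave-one-study-out version, which is the behavior the LOSO requirement is meant to enforce.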

Enrichment analysis with ConsensusPathDB
The bacterial and viral gene signatures were input into ConsensusPathDB for overrepresentation analysis. The pathway analysis required a minimum overlap of 3 candidates and used a p-value cutoff of < 0.01. The entity graph visualization was generated with the same database, and edges with no shared candidates between nodes were filtered out.
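For readers unfamiliar with overrepresentation analysis, the sketch below shows how the two cutoffs above could be applied to a single pathway. It assumes a one-sided hypergeometric test, which is the usual choice for this kind of analysis; the internals of ConsensusPathDB are not described here, and the function names and gene counts are hypothetical.

```python
from math import comb


def overrepresentation_p(n_background, n_pathway, n_signature, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap)."""
    upper = min(n_pathway, n_signature)
    total = comb(n_background, n_signature)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_signature - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def is_enriched(n_background, n_pathway, n_signature, n_overlap,
                min_overlap=3, p_cutoff=0.01):
    """Apply the minimum-overlap and p-value cutoffs stated in the text."""
    if n_overlap < min_overlap:
        return False
    p = overrepresentation_p(n_background, n_pathway, n_signature, n_overlap)
    return p < p_cutoff


# Hypothetical counts: 20,000 background genes, a 50-gene pathway,
# a 39-gene signature, and 5 genes shared between the two.
enriched = is_enriched(20000, 50, 39, 5)
```

The minimum-overlap rule is checked first, so a pathway sharing only one or two genes with the signature is never reported, however small its p-value.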

Recombinant substrate screens with fluorescent substrates
Quenched fluorogenic probes were synthesized by CPC Scientific (sequences in Table S3). Each probe was diluted first in dimethylformamide (DMF), subsequently in PBS, and plated into a 384-well plate. The plates were sealed and stored at -20°C until needed. To perform the cleavage assay, recombinant proteases were activated as necessary and diluted in their respective assay buffers with 0.1% BSA. The recombinant proteases were then added to each substrate-containing well for a final reaction volume of 50 µL (20 µM substrate and 20 nM recombinant protease per well). Control wells, which contained no protease, were run on the same plate. Each protease-substrate pair and the relevant blank control were plated in duplicate. Cleavage over time was quantified by fluorescence as measured by a fluorimeter (Tecan Infinite M200 Pro). Fold change was calculated as the fluorescent signal at 10 minutes divided by the original fluorescence at the start of the read. All enzyme sources and buffers can be found in Table S4.
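The fold-change calculation, together with the standardized specificity and efficiency metrics described in the caption of Figure S2, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used for the screen: duplicate wells are assumed to be averaged after computing each fold change, z-scores are assumed to use the sample standard deviation, and the substrate and protease labels and readings are hypothetical.

```python
from statistics import mean, stdev


def fold_change(trace, minutes, readout_minute=10):
    """Fluorescence at the readout time divided by fluorescence at the first read."""
    return trace[minutes.index(readout_minute)] / trace[0]


def z_scores(values):
    """Standardize values with the sample mean and sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def sve_scores(fc):
    """fc[substrate][protease] is a fold change.

    Returns {(substrate, protease): (x, y)}, where x is the z-score across
    the screened proteases and y is the z-score across the screened substrates.
    """
    substrates = list(fc)
    proteases = list(fc[substrates[0]])
    x, y = {}, {}
    for s in substrates:
        for p, z in zip(proteases, z_scores([fc[s][p] for p in proteases])):
            x[(s, p)] = z
    for p in proteases:
        for s, z in zip(substrates, z_scores([fc[s][p] for s in substrates])):
            y[(s, p)] = z
    return {pair: (x[pair], y[pair]) for pair in x}


# Hypothetical duplicate wells for one protease-substrate pair (reads at 0, 5, 10 min).
minutes = [0, 5, 10]
duplicate_wells = [[100.0, 250.0, 400.0], [110.0, 260.0, 440.0]]
fc_pair = mean(fold_change(well, minutes) for well in duplicate_wells)

# Hypothetical fold-change matrix for three substrates and three proteases.
fc = {
    "S1": {"P1": fc_pair, "P2": 1.0, "P3": 1.0},
    "S2": {"P1": 1.0, "P2": 3.0, "P3": 1.1},
    "S3": {"P1": 1.2, "P2": 1.0, "P3": 2.0},
}
scores = sve_scores(fc)
# Pairings scoring above 1 on either metric, as labeled in the SvE plots.
hits = sorted(pair for pair, (x, y) in scores.items() if x > 1 or y > 1)
```

A pairing in the upper right quadrant (both values above 1) is one where the substrate is cleaved far more by that protease than by the others, and that protease cleaves it far more than it cleaves the other substrates.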

Mouse pneumonia models
All animal studies were approved by the MIT IACUC (protocol 0619-032-44) and were conducted in compliance with institutional and national policies. Female BALB/c mice (Taconic), 7 to 9 weeks old, were dosed with either S. pneumoniae (NCTC 7466), K. pneumoniae (ATCC 43816), H. influenzae (ATCC 33391), pneumonia virus of mice (ATCC VR-1819), or influenza (Influenza A/PR/8/34 (H1N1), Charles River). The infectious dose for each pathogen was selected based on physical signs of infection in the mice and plated colony counts (for bacteria) (see Figure S1). To administer the pathogens, mice were first anesthetized by isoflurane inhalation (Zoetis).
While the mice were under anesthesia, pathogens were passively inhaled following either intratracheal instillation (IT; for S. pneumoniae, K. pneumoniae, and H. influenzae) or intranasal instillation (IN; for PVM and influenza A).
A volume of 50 µL was administered for all pathogens except influenza A, which was administered at 30 µL. Age- and gender-matched control mice in each experiment received 50 µL of sterile-filtered PBS, either IT for the bacterial cohorts or IN for the viral cohorts.

Pathogen preparation
All bacteria were first cultured overnight (37°C, shaking at 250 rpm for 14-20 hours) and subsequently grown in secondary culture with 1:100 to 1:200 dilutions to an OD600 of 0.5-0.7, corresponding to a phase of exponential growth. K. pneumoniae was cultured in LB broth (Invitrogen). S. pneumoniae was plated overnight on blood-agar plates with neomycin (Hardy Diagnostics), and subsequently cultured in liquid brain-heart infusion (BHI; BD) media. H. influenzae was cultured in supplemented BHI (BHI with NAD and histidine-hemin). The bacteria were then pelleted, washed three times with sterile-filtered PBS, and diluted to the appropriate concentration for administration. To prepare the viruses for infection, all viruses were diluted directly into sterile-filtered PBS from aliquoted stocks and kept on ice until administration.

qRT-PCR for viral loads and GZMB
Lungs were dissected from infected and healthy mice, rinsed in PBS and stored in RNAlater (Sigma Aldrich) at -80°C until use. RNA extraction was performed using the RNeasy Mini kit (Qiagen). On-column DNase digestion was performed using the RNase-Free DNase Set (Qiagen). RNA concentration was measured on a Nanodrop at A260. cDNA was prepared with the RevertAid First Strand cDNA synthesis kit (Thermo Fisher). qRT-PCR was performed using Ssofast EvaGreen Supermix (Bio-Rad). For viral load quantification, custom oligo primers for PR8 and PVM
Immediately after dosing, all mice were given a subcutaneous injection of PBS (400 µL) to promote adequate urine volumes for subsequent analysis. For the viral pneumonia models, mice were administered the ABN cocktail 6 days post infection (p.i.). For the bacterial pneumonia models, mice were administered ABNs 16 hours p.i. After receiving the ABNs, all mice were returned to their home cages for one hour with full access to food and water. After this hour, their bladders were manually voided, and they were transferred into urine collection chambers. At the end of the second hour, the bladders were manually voided again and the urine was collected, along with any urine that was produced in the collection chamber. The urine samples were then sent to Syneos Health for LC-MS/MS analysis. Reporter quantification by LC-MS/MS was performed as previously described (12).
At least 15 mice were infected with each pathogen, alongside at least 5 healthy control mice per pathogen. These sample sizes were chosen to ensure that each group would be greater than or equal to ten in order to properly train and test the diagnostic classifiers. After infection, mice were monitored daily. In the first cohort, one mouse from each viral model died before urine collection, and in cohort 2 several urine samples had volumes that were too low for analysis with mass spectrometry. Aside from these natural losses, no samples were excluded from analysis.
Healthy and infected mice were caged separately to prevent any possible cross-contamination.
Investigators were not blinded to infection status, as proper decontamination and sterility measures needed to be taken to minimize cross-contamination and ensure the safety of the researchers.

Tissue dissection from mice and slide preparation
Female BALB/c mice were infected with influenza A (PR8) or S. pneumoniae (SP) as described above. PR8 and SP mice were euthanized at 6 days and 16 hours after infection initiation, respectively. The lungs were removed from the infected mice or healthy controls, and put into a 6-well plate filled with PBS while the lobes were separated. The individual lobes were then immediately embedded in optimal-cutting-temperature (OCT) compound (Sakura), frozen in isopentane chilled with dry ice, and stored at -80°C until sectioning. Cryosectioning was performed at the Koch Institute Histology Core. The resulting slides were then stored at -80°C until use for immunofluorescent staining or AZP experiments.

Immunofluorescent staining for immune cell markers and GZMB
Fresh-frozen slides were prepared as described above. To prepare the slides for staining,

In situ zymography with AZPs
For experiments involving on-slide AZP activation, slides were dried and fixed as previously in assay buffer (50 mM Tris, pH 7.5) for 4 hours at 37°C. Meanwhile, fresh-frozen slides with healthy lung tissue were prepared and blocked with BSA as described. After blocking, the slides were incubated at 4°C for 1 hour with either the pre-cleaved BV01-Z mixture or intact BV01-Z, together with Cy7-polyR and DTNB diluted in assay buffer. Slides were then washed (3 × 5 minutes), stained with Hoechst, washed, and mounted as previously described.

Quantification of immunofluorescent staining and AZP signal
All slides were imaged on a Pannoramic 250 Flash III whole slide scanner (3DHistech).
Whole slide images were imported into QuPath (0.2.3) for quantification. Individual cells were detected using the cell detection feature on the DAPI channel. Intensity thresholds were manually determined based on mean and maximum intensity distributions for each channel, in order to classify cells as being either positive or negative for any given marker (CD8, NK, GZMB, Ly6G).
Using scripts, each cell on the slide was annotated based on whether it met the threshold, and the percentage of positive cells for each marker was calculated using Excel. For AZP quantification, the mean intensity of each cell in the AZP and polyR channels was calculated, and the ratio of those values was interpreted as the relative AZP signal. All further statistical analyses were performed in GraphPad Prism 9.0.
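The two per-cell calculations described above can be sketched as follows. This is not the QuPath or Excel workflow itself. It assumes a single intensity measurement per cell and a greater-than-or-equal comparison against the manually chosen threshold, and the function names and intensity values are hypothetical.

```python
def percent_positive(cell_intensities, threshold):
    """Percentage of detected cells whose marker intensity meets the threshold.

    The denominator is the total number of detected cells.
    """
    n_positive = sum(1 for value in cell_intensities if value >= threshold)
    return 100.0 * n_positive / len(cell_intensities)


def relative_azp_signal(azp_means, polyr_means):
    """Per-cell ratio of mean AZP-channel intensity to mean polyR-channel intensity."""
    return [azp / polyr for azp, polyr in zip(azp_means, polyr_means)]


# Hypothetical per-cell mean intensities exported from whole-slide images.
gzmb_intensity = [10.0, 50.0, 80.0, 120.0]
pct_gzmb = percent_positive(gzmb_intensity, threshold=50.0)
azp_ratio = relative_azp_signal([200.0, 90.0], [100.0, 90.0])
```

Dividing by the polyR channel normalizes the AZP signal to the amount of free polyR bound in each cell, so ratios can be compared across slides that are stained with the same mixture.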

Figure S1. Characterization of the mouse models for bacterial and viral pneumonia. (A,B,C) Various doses of each bacterium were administered to immunocompetent mice. Lungs from these mice were harvested 16 hours after infection initiation, homogenized, and plated to determine bacterial loads. Each point represents one mouse, n = 5 to 10 per dose. (D,F) The viral load in mice infected with pneumonia virus of mice (PVM, 300 PFU/mouse) and influenza A (PR8, 3.8e4 EID50/mouse) was evaluated over time. Viral loads were quantified using qRT-PCR. n = 3-5 mice per timepoint. (E,G) The physical manifestations of disease were tracked via body weight throughout the timecourse of infection. n = 3-5 mice per timepoint. In all dose characterization graphs, green bars indicate the chosen condition (either dose or timepoint) that was used for each model.

Figure S2. Specificity versus efficiency (SvE) plots can be used to identify optimal protease-substrate pairings. Each plot visualizes the correlation between standardized metrics that were calculated based on the fluorescence fold change at 10 minutes after incubation of the fluorescent probe with each recombinant protease. The x-axis plots the z-score of the fluorescence fold change across the screened proteases, which effectively quantifies the specificity of each protease for a given probe. The y-axis plots the z-score across the screened substrates, indicating how efficiently each probe was cleaved by a given protease. Dotted lines are plotted along each axis at a z-score of 1, to delineate hits that are one standard deviation or more above the mean for each metric. The protease with the highest value on both metrics is shown in red, but other proteases in the upper right quadrant are also considered "optimal" hits, and any pairing that scores above 1 on either metric is labeled.

Figure S3. Immunofluorescent staining of infected lung tissue reveals differences in immune cell recruitment between S. pneumoniae and influenza. (A) Representative images of viral influenza (PR8), bacterial S. pneumoniae (SP), and healthy lungs stained for NKp46 and CD8. Staining for each disease state was done on consecutive slides (n=2 sections/slide). (B) Based on the staining, cells were classified as either natural killer cells (NK) or CD8 T cells (CD8), and are colored green (NK) or red (CD8) if they were considered stain positive based on human-determined thresholds that were kept consistent across disease states for each antibody. The colors are the result of artificial recoloring by QuPath of the AF488 and Cy7 secondary antibodies, respectively. The immunofluorescent images were counterstained with DAPI (blue), which was used to automatically detect all cells present in the slide. Total cell count was used as the denominator for calculating the percentage of target staining across each slide; quantification of percent-positive cells and total cell counts can be found in Figure 5A,B.

Figure S4. Immunofluorescent staining of infected lung tissue reveals differences in protease expression and neutrophil recruitment between S. pneumoniae and influenza. Representative images of viral influenza (PR8), bacterial S. pneumoniae (SP), and healthy lungs stained for (A) Granzyme B and (B) RB6-8C5, a neutrophil marker that binds to Ly6G. Staining for each disease state was done on consecutive slides (n=2 sections/slide). Based on the staining, cells were classified as either (A) GZMB-expressing or (B) neutrophil-lineage, and are colored green (automatic recoloring by QuPath of AF488 secondary antibodies for each stain) if they were considered stain positive based on human-determined thresholds that were kept consistent across disease states for each antibody. The immunofluorescent images were counterstained with DAPI (blue), which was used to automatically detect all cells present in the slide. Total cell count was used as the denominator for calculating the percentage of target staining across each slide; quantification of percent-positive cells and total cell counts can be found in Figure 5C,D. Red scale bars = 20 µm.

Figure S5. BV01-Z is cleaved by recombinant Granzyme B and its activity is abrogated by the addition of protease inhibitors. (A) The peptide sequence from BV01 (the ABN format) was also used to create an AZP (BV01-Z) and a fluorogenic substrate (BV01-F). (B) Staining of fresh-frozen healthy lung tissue with intact probe and with BV01-Z that was incubated with GZMB before being applied to tissue. Staining shows free polyR (teal) and the cleaved polyR domain (yellow). (C) Activated recombinant Granzyme B was pre-incubated with either the GZMB-specific inhibitor or the protease inhibitor cocktail for 3 hours at 37°C. The resulting GZMB with and without inhibition was then incubated with BV01-F (20 µM final concentration in the well), and cleavage of BV01-F was measured over the course of 1 hour. The relative fold change in fluorescence compared to t=0, which reflects substrate cleavage, is shown.

Figure S6. BV01-Z signal in mice with influenza versus healthy controls correlates with disease state and is abrogated in the presence of a Granzyme B inhibitor. Staining of fresh-frozen lung from PR8-infected mice and healthy tissue with no inhibition or in the presence of a GZMB-specific inhibitor. Each image is representative of consecutive sections. The ratio of the AZP signal (yellow) to free polyR (teal) is quantified (n=2 per condition). Counterstained with DAPI (blue).

Figure S7. Classifiers can be trained using urinary reporter concentrations of GZMB-specific ABNs to discriminate among disease states. (A) A binary classifier was trained using the in vivo concentrations of BV01 and BV14, which are both predicted to be indicative of GZMB activity, and tested to determine its diagnostic potential for discriminating bacterial and viral pneumonia. (B, C) Confusion matrices were then used to evaluate the performance of a multiclass SVM algorithm trained on BV01 and BV14 to diagnose pneumonia and stratify etiology. All classifier results are averages over 10 independent train-test trials.

Figure S8. A subset of five ABNs can achieve high binary and multiclass classification of etiology. ROC curves and confusion matrices showing the performance of a support vector machine trained on urinary reporters from the mice in Cohort 2. Each pair of graphs shows the performance after training classifiers using the changing set of reporters listed on the left. All classifier results are averages over 10 independent train-test trials.

Table S1. Discovery datasets for creating transcriptomic signatures. Cohorts of human transcriptomic data were used to train and validate a diagnostic classifier for bacterial and viral infections. Information on the source of the data and the patient cohorts that were used to train the classifier is listed.

Table S2. Validation datasets of the transcriptomic signatures. Independent human datasets were used to test the diagnostic classifier. Information on the source of the data and the patient cohorts that were used to test the classifier is listed.

Table S4. Recombinant proteases and buffers used for in vitro screens. Specific buffers were used to create optimal cleavage conditions for each recombinant protease. Activation buffers were used for pre-incubation of the protease as needed.